Data Quality Checks

The Qualtrics team earmarked all of the problematic cases for us (duplicates, speeders, possible bots, etc.), and these have all been scrubbed. I’ve also done some cleaning of the raw data myself; there were a few small things that needed fixing up (e.g. respondents who entered their age as “1987” rather than “33”). The final total sample size was 906. No major concerns there.

To begin, I performed a few quick sanity checks of data quality, and double-checked that we didn’t have any troubling chance differences in major demographics between our four randomised groups.

Overall Survey Completion Times

All completed surveys returned with a completion time of less than half the median (i.e. less than 262 seconds; note the threshold is based on the median before exclusions, which is why it is not half the post-exclusion median of 576 seconds reported below) were culled by the Qualtrics team. This is a standard procedure Qualtrics runs to ensure data quality on all of their online surveys. However, rather than using the completion time for the full survey, as is usually done, I asked the Qualtrics team to compute this from the first substantive question to the last substantive question, leaving out the time spent reading the article at the beginning of the survey. This was done so that we could safely use the same exclusion threshold for each of the intervention groups as well as the control group, who did not read an article (and thus would have had a faster completion time).
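
In code, the exclusion rule amounts to something like the sketch below, where SubTime is a hypothetical name for the first-to-last substantive question duration and d_raw stands in for the pre-exclusion data:

# Keep only cases at or above half the pre-exclusion median duration
d <- d_raw |> filter(SubTime >= median(SubTime) / 2)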

Descriptive statistics for completion time are displayed in the table and plot below. For convenience I’ve included both seconds and minutes. The boxplot displays the median and IQR, alongside a histogram of counts and raw data (small crosses).

Time     Min     Max      Median  Mean    SD
Seconds  262.00  45462.0  576.0   833.48  1848.79
Minutes  4.37    757.7    9.6     13.89   30.81

Group Sizes

Following these data-cleaning measures we were left with a sample of 906 participants. Group sizes (n) were relatively similar after randomisation, and did not appear to have been distorted by the exclusions. I performed a Bayesian multinomial regression to estimate the probability of being assigned to each group in the final data set.
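
The regression code itself isn’t shown above, but because an intercept-only multinomial model is conjugate, an equivalent sketch can draw the assignment probabilities straight from a Dirichlet posterior in base R (a flat Dirichlet(1, 1, 1, 1) prior is assumed here):

n_k <- c(Control = 234, Brain = 228, Design = 224, Clubs = 220)  # observed group sizes
a_post <- n_k + 1                                # Dirichlet prior + observed counts
# Draw from the Dirichlet via independent gammas, then normalise each draw
g <- matrix(rgamma(4000 * 4, shape = rep(a_post, each = 4000)), ncol = 4)
p_draws <- g / rowSums(g)
colnames(p_draws) <- names(n_k)
apply(p_draws, 2, quantile, probs = c(.025, .5, .975))  # quantile intervals (the table below uses HDIs)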

Group Assignment
Group n Observed % p .lower .upper .width .point .interval
Control 234 25.83 0.258 0.230 0.285 0.95 median hdi
Brain 228 25.17 0.252 0.224 0.279 0.95 median hdi
Design 224 24.72 0.247 0.220 0.275 0.95 median hdi
Clubs 220 24.28 0.243 0.216 0.271 0.95 median hdi

No obvious cause for alarm there. All posterior intervals overlapped p = 0.25, the value we would expect if participants were allocated evenly across the four groups. There were a few more participants in the control group; this could be because that condition was less onerous (no reading), but any such differences were small, and not reliable enough to worry about.

Here are the posterior intervals on a plot:

Demographics

Age and Gender

The sample was “soft” stratified to approximately balance the number of participants identifying as Male (436) and Female (468) against population data. Of the two remaining participants, one self-described as “non-binary”, and the other did not submit a text response.

Participants were aged between 18 and 89. To ensure a diverse range of age groups was recruited, participants were also stratified into the age groups listed in the table below during sampling.

Age Quotas n
18-24 77
25-34 175
35-44 165
45-54 156
55-64 145
65+ 188

The women in our sample were younger on average (mean age = 44.83) than the men (mean age = 50.39).

A simple logistic regression suggested that Age predicted Gender, such that older participants in this sample were more likely to have identified as Male. If we wanted to investigate either age or gender as a covariate, it would be best to consider both.

# This time modelling using the rethinking package
# (dat_list here holds Male coded 0/1 and Age in years):
m.GenderAge <- 
  ulam(
    alist(
      Male ~ dbern(p),
      logit(p) <- alpha + b*Age,
      # I'm using some fairly mild default priors
      # See McElreath's Statistical Rethinking Chapter 11 for an intro
      alpha ~ dnorm(0, 1.5),
      b ~ dnorm(0, 1)
    ), data = dat_list, cores = 8, chains = 8, iter = 1750, warmup = 500
  )

## Errors about initial values aren't something to worry about in this instance
gather_draws(m.GenderAge, alpha, b) |> 
  median_hdi() |> 
  kable(digits = 3, caption = "Logistic Regression: Gender by Age (LogOdds)")
Logistic Regression: Gender by Age (LogOdds)
.variable .value .lower .upper .width .point .interval
alpha -0.994 -1.368 -0.588 0.95 median hdi
b 0.019 0.012 0.027 0.95 median hdi
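
The slope b is on the log-odds scale; exponentiating converts it to an odds ratio:

exp(0.019)       # ~1.02: each year of age multiplies the odds of identifying as Male by ~2%
exp(0.019 * 10)  # ~1.21 per decade of age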

Let’s check the randomisation as a bit of a warm-up.

Self-reported gender was approximately evenly distributed across groups:

# Note: dat_list must now include Group as an integer index (1-4)
m.GenderGroup <- 
  ulam(
    alist(
      Male ~ dbern(p),
      logit(p) <- alpha[Group],
      alpha[Group] ~ dnorm(0, 1.5)
    ), data = dat_list, cores = 8, chains = 8, iter = 1750, warmup = 500
  )
d |> 
  count(Group, Gender) |> 
  group_by(Group) |> 
  mutate(observed.prop = n/sum(n)) |> 
  filter(Gender == "Male") -> obs

post <- tidy_draws(m.GenderGroup)

# Long form
post <- 
  post |> 
    select(.draw, alpha.1:alpha.4) |> 
    pivot_longer(cols = alpha.1:alpha.4, names_to = "k", names_prefix = "alpha.", values_to = "alpha")

# Produce summary table
post |> 
  # Rename groups
  mutate(Group = factor(k, levels = 1:4, labels = levels(d$Group))) |> 
  # Compute probabilities (logistic transform)
  mutate(p = plogis(alpha)) |> 
  group_by(Group) |> 
  median_hdi(p) -> post.sum

left_join(obs, post.sum) |> 
  select(Group, n, observed.prop, p, .lower, .upper) |> 
  kable(digits = 3, caption = "Observed Proportions and Probability Estimates for Identifying as Male, by Group")
Observed Proportions and Probability Estimates for Identifying as Male, by Group
Group n observed.prop p .lower .upper
Control 108 0.462 0.466 0.404 0.528
Brain 109 0.478 0.478 0.415 0.544
Design 105 0.469 0.469 0.402 0.533
Clubs 114 0.518 0.523 0.460 0.590
post |> 
  mutate(Group = factor(k, levels = 1:4, labels = levels(d$Group))) |> 
  mutate(p = plogis(alpha)) |> 
  ungroup() |> 
  select(.draw, Group, p) |> 
  pivot_wider(names_from = Group, values_from = p) |> 
  # Note: these are ratios of probabilities (relative probability), not odds ratios
  mutate(across(Control:Clubs, ~.x / Control)) |> 
  pivot_longer(Control:Clubs, names_to = "Group", values_to = "Ratio") |> 
  mutate(Group = factor(Group, levels(d$Group))) |> 
  filter(Group != "Control") |> 
  group_by(Group) |> 
  median_hdi() |> 
  kable(digits = 2, caption = "Probability Ratio vs Control for Male ID by Group")
Probability Ratio vs Control for Male ID by Group
Group Ratio .lower .upper .width .point .interval
Brain 1.03 0.84 1.24 0.95 median hdi
Design 1.01 0.83 1.21 0.95 median hdi
Clubs 1.12 0.91 1.33 0.95 median hdi

Mean age did not differ substantially between experimental groups:
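
The model behind this table isn’t shown in the output; here is a minimal sketch of the assumed form, with Control as the reference level and Brain/Design/Clubs as 0/1 indicators in dat_list (the priors here are illustrative, not necessarily the ones actually used):

m.AgeGroup <- ulam(
  alist(
    Age ~ dnorm(mu, sigma),
    mu <- alpha + bBr*Brain + bD*Design + bCl*Clubs,
    alpha ~ dnorm(50, 20),  # weakly informative around a plausible mean age
    bBr ~ dnorm(0, 10),
    bD ~ dnorm(0, 10),
    bCl ~ dnorm(0, 10),
    sigma ~ dexp(0.1)
  ), data = dat_list, cores = 8, chains = 8, iter = 1750, warmup = 500
)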

Model Parameters for Age by Group
name value .lower .upper .width .point .interval
alpha 48.19 46.20 50.06 0.95 median hdi
bBr -0.73 -3.46 1.98 0.95 median hdi
bCl -0.62 -3.31 2.22 0.95 median hdi
bD -1.26 -3.99 1.48 0.95 median hdi

State

There were some slight differences in State composition between groups:

# Predict NSW by Group ---------------------------------------------------------------------------
dat_list <- list(Group = as.integer(d$Group), NSW = ifelse(d$State2 == "New South Wales", TRUE, FALSE))

m.State.Group <- ulam(
  alist(
    NSW ~ bernoulli(p),
    logit(p) <- alpha[Group],
    alpha[Group] ~ normal(0, 1.5)
  ), data = dat_list, cores = 8, chains = 8, iter = 1750, warmup = 500
)

d |> 
  count(Group, State2) |> 
  group_by(Group) |> 
  mutate(observed.prop = n/sum(n)) |> 
  filter(State2 == "New South Wales") -> obs
post <- tidy_draws(m.State.Group)

# Long form
post <- 
  post |> 
    select(.draw, alpha.1:alpha.4) |> 
    pivot_longer(cols = alpha.1:alpha.4, names_to = "Group", names_prefix = "alpha.", values_to = "alpha")

post |> 
  mutate(Group = factor(Group, levels = 1:4, labels = levels(d$Group)), 
         p = plogis(alpha)) |> 
  group_by(Group) |> 
  median_hdi(p) -> post.sum

left_join(obs, post.sum) |> 
  select(Group, n, observed.prop, p, .lower, .upper) |> 
  kable(digits = 3, caption = "Observed Proportions and Probability Estimates for NSW Location, by Group")
Observed Proportions and Probability Estimates for NSW Location, by Group
Group n observed.prop p .lower .upper
Control 120 0.513 0.513 0.451 0.579
Brain 106 0.465 0.465 0.400 0.531
Design 92 0.411 0.411 0.348 0.475
Clubs 115 0.523 0.523 0.456 0.585

Socio-Economic

Participants were recruited from both NSW (433) and Victoria (473). We collected some simple demographic information about our sample, as displayed below:

Local Area n Percent
Major City 661 73.0
Inner Regional 144 15.9
Outer Regional 88 9.7
Remote 12 1.3
Very Remote 1 0.1

Education n Percent
Year 10 or below 83 9.2
Year 11 or equivalent 33 3.6
Year 12 or equivalent 133 14.7
A trade, technical certificate or diploma 227 25.1
Undergraduate university degree 268 29.6
Postgraduate degree 162 17.9

Employment n Percent
Yes, full time 339 37.4
Yes, part time 149 16.4
Yes, casual 44 4.9
No, not employed - looking for work 96 10.6
No, not employed - not looking for work 267 29.5
Currently stood-down 11 1.2

Gambling Involvement

PGSI

Our sample included a number of individuals who gambled regularly, and a number who reported symptoms indicating varying levels of risk of gambling-related harm. Again, the distribution of PGSI risk categories did not differ significantly between experimental groups.

Group No Harm Low Risk Moderate Risk High Risk
Control 162 25 17 30
Brain 151 26 20 31
Design 140 27 29 28
Clubs 147 12 25 36
Total 600 90 91 125

Test df Chi Squared p.value
Pearson’s Chi-squared test 9 12.8 0.17
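
For reference, this test is a one-liner in base R. A minimal sketch, assuming the risk categories sit in a column named PGSI_cat (a hypothetical name):

chisq.test(table(d$Group, d$PGSI_cat))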

Gambling Participation

Likewise, the number of individuals who had gambled at least once in the prior 12 months was (approximately) evenly distributed across experimental groups:

Group No Gambling Gambling
Control 149 85
Brain 162 66
Design 143 81
Clubs 147 73
Total 601 305

Test df Chi Squared p.value
Pearson’s Chi-squared test 3 3.66 0.3

Gambling Frequency

There were some issues with data quality in the responses to gambling frequency. For instance, it seems a number of participants misunderstood the question’s time window: “On how many days in the last 12 months have you gambled money in each of the following ways?”

The 20 highest overall totals are displayed below. Given there are only 365 days in a year, some of these responses are clearly troubling. It’s possible that some individuals interpreted this as the total number of bets (e.g. spins on a poker machine) rather than the number of days on which betting occurred at least once. It’s also possible that these participants were responding at random, made keystroke errors, or were satisficing.
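
At some point we may want to flag these cases programmatically. A sketch of the flagging logic, assuming the per-activity frequency columns are named as in the table below:

d |>
  rowwise() |>
  mutate(TotalDays = sum(c_across(Cards:LotteryScratch), na.rm = TRUE)) |>
  ungroup() |>
  arrange(desc(TotalDays)) |>
  slice_head(n = 20)   # rows with TotalDays > 365 are clearly suspect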

Cards Racing SportsBetting Dice SkillGame Pokies Bingo CasinoGames LotteryScratch None
0 5000 2500 0 0 1500 0 1000 200 FALSE
0 1000 0 0 0 2000 0 6000 500 FALSE
0 0 0 1000 0 1500 0 0 0 FALSE
0 0 0 0 0 1500 0 0 500 FALSE
0 0 2000 0 0 0 0 0 0 FALSE
0 0 0 0 0 1549 0 0 0 FALSE
0 2 0 0 0 0 0 0 1220 FALSE
0 50 0 0 0 1000 20 0 10 FALSE
0 300 350 0 0 25 0 55 200 FALSE
200 0 200 0 0 0 200 0 200 FALSE
0 0 0 0 0 0 0 0 800 FALSE
0 600 0 0 0 0 0 0 120 FALSE
0 0 0 0 0 0 50 0 500 FALSE
0 25 0 0 0 0 0 0 500 FALSE
0 0 0 0 0 500 0 0 0 FALSE
0 0 0 0 0 500 0 0 0 FALSE
0 350 0 0 0 125 0 20 0 FALSE
0 150 0 0 0 20 0 0 300 FALSE
0 0 0 0 0 200 100 0 100 FALSE
0 200 200 0 0 0 0 0 0 FALSE

At this stage I’m not sure we plan to use this information, but it’s good to be aware of.

Greatest Daily Expenditure

We also asked participants: “What is the largest amount of money you have ever gambled with in any one day?” Unsurprisingly, responses to this question were highly skewed. The highest reported lifetime daily expenditure was $1 million. I’ve drawn two plots below, capped at $2000 (200 bins) and $100 (101 bins), to display the spread:

There’s also a fairly classic preference for multiples of 5. Given that distribution, and how strongly the means are skewed by extreme values, it’s probably not a good idea to run parametric tests on these data. Again, I have no direct plans to use this information at this stage.

Group median mean sd
Control 24 256.32 1036.67
Brain 20 707.32 7515.91
Design 20 4788.79 66814.10
Clubs 20 287.34 1100.82
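
If we ever did want to compare groups on this measure, a rank-based test would sidestep the skew. A minimal sketch, assuming the responses sit in a column named MaxDay (a hypothetical name):

kruskal.test(MaxDay ~ Group, data = d)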

Article Reading Time

Participants appeared to spend about the same amount of time reading in each condition. There is, however, a lot of variation in reading time overall: even if it doesn’t vary much by group, it certainly does by participant.

Note that we set a timer on the page so participants could not press Next until at least 45 seconds had passed. We also had 19 missing values on this measure.

Group Min 5% 25% Median 75% 95% Max Mean SD
Brain 63 73.1 103.0 149 235.0 551.2 4415 227.8 343.4
Design 62 70.8 106.0 164 245.8 411.2 1831 202.5 168.5
Clubs 59 71.0 106.2 153 223.8 524.8 4016 239.8 387.6
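
For reference, a quantile summary like the one above can be produced with a short dplyr pipeline. A sketch, assuming (as in the code further below) that SpeedInterv holds reading time in seconds:

d |>
  filter(Group != "Control", !is.na(SpeedInterv)) |>
  group_by(Group) |>
  summarise(
    Min    = min(SpeedInterv),
    `5%`   = quantile(SpeedInterv, .05),
    `25%`  = quantile(SpeedInterv, .25),
    Median = median(SpeedInterv),
    `75%`  = quantile(SpeedInterv, .75),
    `95%`  = quantile(SpeedInterv, .95),
    Max    = max(SpeedInterv),
    Mean   = mean(SpeedInterv),
    SD     = sd(SpeedInterv)
  ) |>
  kable(digits = 1)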

Across each of the above quantiles reading times were reasonably similar, although the median reading time in the Design group was slightly higher than in the other two groups. Notably, there were more extreme observations (> 1000 seconds) out in the tail for the Brain and Clubs conditions:

d |>
  group_by(Group) |>
  filter(Group != "Control" & !is.na(SpeedInterv)) |>
  count(SpeedInterv > 600, Group) |> 
  kable()
Group SpeedInterv > 600 n
Brain FALSE 215
Brain TRUE 8
Design FALSE 210
Design TRUE 6
Clubs FALSE 204
Clubs TRUE 10

There is also some clear positive skew in each of these distributions. This is unsurprising: we are measuring time, and measures of time commonly display this kind of positive skew. The measure is also bounded below at 45 seconds (we set a minimum reading time on the article page for all conditions). A standard regression isn’t going to cut it here, so I’m going to want to transform these data.

I might also need to winsorise some of the more extreme values. I could probably make a case for listwise deletion of these cases: 4000 seconds is over an hour, and it seems unlikely that someone was that interested in the content of the article; more likely they got up to do something else. But they may nonetheless have paid attention, and I don’t like deleting data without a clear indication that something was wrong with the cases.
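
If winsorising does turn out to be necessary, it only takes a couple of lines. A sketch with an illustrative 99th-percentile cap (both the threshold and the new column name are assumptions):

cap <- quantile(d$SpeedInterv, 0.99, na.rm = TRUE)         # illustrative cap
d <- d |> mutate(SpeedInterv_w = pmin(SpeedInterv, cap))   # cap, don't delete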

Taking a quick look, the failure rate on our comprehension check was about the same among these more extreme cases as it was in the data set overall.

# 10 minutes
d |>
  group_by(Group, TotalCheck) |>
  filter(Group != "Control" & !is.na(SpeedInterv)) |>
  count(SpeedInterv > 600) |> 
  filter(`SpeedInterv > 600` == TRUE) |> 
  group_by(Group) |>
  mutate(prop = n/sum(n)) |> 
  kable()
Group TotalCheck SpeedInterv > 600 n prop
Brain Correct TRUE 7 0.8750000
Brain Failed TRUE 1 0.1250000
Design Correct TRUE 5 0.8333333
Design Failed TRUE 1 0.1666667
Clubs Correct TRUE 9 0.9000000
Clubs Failed TRUE 1 0.1000000
# 7 minutes
d |>
  group_by(Group, TotalCheck) |>
  filter(Group != "Control" & !is.na(SpeedInterv)) |>
  count(SpeedInterv > 420) |> 
  filter(`SpeedInterv > 420` == TRUE) |> 
  group_by(Group) |>
  mutate(prop = n/sum(n)) |> 
  kable()
Group TotalCheck SpeedInterv > 420 n prop
Brain Correct TRUE 17 0.8500000
Brain Failed TRUE 3 0.1500000
Design Correct TRUE 8 0.8888889
Design Failed TRUE 1 0.1111111
Clubs Correct TRUE 14 0.8750000
Clubs Failed TRUE 2 0.1250000

I’ll hold onto these cases rather than dropping them from the primary analysis.

Data Transform

As I mentioned, I’ll transform these data prior to analysis to account for the positive skew. A log-normal is likely a much better assumption than a normal. I will also standardise the data following the log transform, which will make life easier when it comes to setting priors. Finally, given that we know the data must be larger than 45 seconds, I’m going to shift the data back by 45 seconds, so that the measure represents time from the point the minimum reading time expired.

Log transform
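
The construction of dat_list isn’t shown in the output below, so here is a minimal sketch of the transform just described. It assumes d2 holds the three intervention groups with reading time in seconds in a column RT; the names RT_lg, RT_z, and G are inferred from the code that follows:

RT_lg <- log(d2$RT - 45)                    # shift past the 45 s minimum, then log
dat_list <- list(
  RT_z  = as.numeric(scale(RT_lg)),         # standardise (mean 0, SD 1) for easy priors
  RT_lg = RT_lg,                            # kept so predictions can be back-transformed
  G     = as.integer(droplevels(d2$Group))  # group index 1-3 (Brain, Design, Clubs)
)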

# I've switched off warnings and results output for this stan code.
m.RTZ <- ulam(
  alist(
    RT_z ~ dnorm(mu, sigma),
    mu <- alpha[G],
    alpha[G] ~ dnorm(0, 1), # Data have been standardised, which makes setting our prior fairly easy
    # Potential Unequal Variance (Equal variance not assumed).
    sigma <- s[G],
    s[G] ~ dexp(1)
  ), 
  data = dat_list,
  cores = 8, chains = 8, iter = 2750, warmup = 1000, log_lik = TRUE
)
# Posterior draws
post <- tidy_draws(m.RTZ) |> select(.draw, alpha.1:s.3)

# Long form
post <- 
  post |> 
    pivot_longer(alpha.1:s.3, names_to = ".variable", values_to = ".value") |> 
    # Some regex to recover group and variable types
    mutate(
      Group = str_extract(string = .variable, pattern = "(?<=\\.)[:digit:]"),
      .variable = str_extract(string = .variable, pattern = "[:alpha:]*(?=\\.)"))

post |> 
  group_by(Group, .variable) |> 
  median_hdi(.value) |> 
  arrange(.variable) |> 
  mutate(Group = factor(Group, labels = c("Brain", "Design", "Clubs")),
         .variable = factor(.variable, levels = c("alpha", "s"), labels = c("Mean", "SD"))) |> 
  rename(param = .variable) |> 
  kable(digits = 2, caption = "Analysis of Mean Reading Time by Group")
Analysis of Mean Reading Time by Group
Group param .value .lower .upper .width .point .interval
Brain Mean 0.00 -0.13 0.13 0.95 median hdi
Design Mean 0.00 -0.13 0.13 0.95 median hdi
Clubs Mean 0.00 -0.14 0.14 0.95 median hdi
Brain SD 1.02 0.93 1.12 0.95 median hdi
Design SD 0.95 0.86 1.04 0.95 median hdi
Clubs SD 1.04 0.95 1.14 0.95 median hdi

Clearly there is no difference in mean reading time by group on the standardised scale. There is a little less variation in the Design condition, though the posterior intervals for the differences still include zero:

post |> 
  filter(.variable == "s") |> 
  mutate(Group = factor(Group, levels = 1:3, labels = c("Brain", "Design", "Clubs"))) |> 
  pivot_wider(names_from = "Group", values_from = ".value") |> 
  mutate(across(c("Brain", "Clubs"), .fns = ~(.x - Design))) |> 
  ungroup() |> 
  select(.draw, Brain, Clubs) |> 
  pivot_longer(cols = Brain:Clubs) |> 
  group_by(name) |> 
  median_hdi() |> 
  kable(digits = 2, caption = "Difference in SD vs. Design Group")
Difference in SD vs. Design Group
name value .lower .upper .width .point .interval
Brain 0.07 -0.06 0.20 0.95 median hdi
Clubs 0.09 -0.04 0.22 0.95 median hdi

Posterior predictive check

A simple check of a model like this is the posterior predictive check, which compares predictions from the model to the observed data. I’ve done this below:

# I'll write a function to get this done: simulate `thisMany` posterior
# predictive data sets per group from the fitted model
get_ppc <- function(model, thisMany = 100) {
  d2 |> count(Group) -> ppc
  
  posterior <- extract.samples(model)
  
  ppc$G  <- 1:3
  
  ppc <- 
    ppc |> 
    mutate(
      sample = list(1:thisMany),
      alpha = purrr::map(
        .x = G,
        .f = function(.x) {
          posterior$alpha[1:thisMany, .x]
        }
      ),
      s = purrr::map(
        .x = G,
        .f = function(.x) {
          posterior$s[1:thisMany, .x]
        })) |> 
    unnest(everything()) |> 
    mutate(
      RT = purrr::pmap(
        .l = list(n = n, mean = alpha, sd = s),
        .f = rnorm
      )
    )
}

# Posterior predictive check:
ppc_Z <- get_ppc(m.RTZ)

back_to_sec <- function(x) {
  x <- ((x * sd(dat_list$RT_lg)) + mean(dat_list$RT_lg))
  x <- exp(x)
  return(x)
}

ppc_Z <- 
  ppc_Z |> 
  select(Group, n, G, sample, RT) |> 
  unnest(sample) |> 
  unnest(RT) |> 
  mutate(RT = back_to_sec(RT))
# Log transform ---------
ggplot() +
  theme_bw() +
  coord_cartesian(xlim = c(0, 1200)) +
  facet_wrap(~Group, nrow = 3) +
  geom_density(data = ppc_Z,
               mapping = aes(x = RT, group = sample),
               colour=alpha("hotpink1", 0.2)) +
  geom_density(data = d2,
               mapping = aes(x = RT, group = Group),
               colour = "black") +
  labs(subtitle = "Log-Transform and Standardised",
       caption = "Density plot of observed data displayed in black
                \nDensity plots over predicted data in hot pink ;)",
       y = "Density")

Some nice-looking fits.

To wrap up, I’ll summarise a posterior predictive check using 10,000 posterior draws from the log-transform model, back on the measurement scale (seconds):

ppc_Z <- get_ppc(model = m.RTZ, thisMany = 1e4)

ppc_Z <- ppc_Z |> unnest(RT)

ppc_Z <- 
  ppc_Z |> 
    mutate(RT = back_to_sec(RT)) |> 
    group_by(Group, sample) |> 
    summarise(
      ".05" = quantile(RT, probs = c(.05)),
      ".25" = quantile(RT, probs = c(.25)),
      ".5" = quantile(RT, probs = c(.5)),
      ".75" = quantile(RT, probs = c(.75)),
      ".95" = quantile(RT, probs = c(.95)))
ppc_Z <- 
  ppc_Z |> 
    ungroup() |> 
    group_by(Group) |> 
    summarise(across(!sample, .fns = list)) |> 
    mutate(across(!Group, .fns = ~purrr::map(.x = .x, median_hdi)))

ppc_Z <- 
  ppc_Z |>   
  mutate(across(!Group, .fns = ~purrr::map_chr(
    .x = .x, 
    .f = function(x) {
      x |> 
        select(y:ymax) |> 
        mutate(across(everything(), .fns = ~round(.x, 2))) |> 
        summarise(Est = paste0(y, " [",  ymin, ", ", ymax, "]")) |> 
        pull(Est)
    })))

ppc_Z |> 
  pivot_longer(cols = !Group, names_to = "Quantile") |> 
  pivot_wider(names_from = Group, values_from = value) |> 
  kable(digits = 2, caption = "Quantile summary of posterior distribution")
Quantile summary of posterior distribution
Quantile Brain Design Clubs
.05 25.27 [17.89, 33.2] 28.07 [20.39, 36.75] 24.56 [17, 32.52]
.25 59.93 [47.77, 72.65] 62.87 [50.77, 75.31] 59.27 [46.86, 72.59]
.5 110.15 [90.5, 132.79] 110.83 [91.99, 131.72] 110.55 [90.51, 134.13]
.75 203 [162.86, 246.58] 195.28 [157.45, 234.49] 205.35 [164.37, 253.91]
.95 481.71 [351.99, 646.58] 436.74 [325.66, 579.8] 496.23 [350.31, 667.02]

Overall I don’t see any reason to be concerned with these observations: any differences in reading times were small or negligible. There were fewer extreme observations in the Design group; winsorising extreme scores may address this. But it is not a central concern, so long as we have good evidence that each group spent enough time reading the article.

Comprehension Check

I’ve saved the most important check for last. Following the intervention article, participants were asked to respond to a simple comprehension check, which asked them to identify which of four sentences best described the article they had viewed. Each group was presented with the same three false answers, in addition to a unique option that correctly described the article. 69.35% of participants responded correctly to the first check, 10.71% asked to view the article again, and 19.94% answered incorrectly. Those who failed or asked to view the article again were shown the article page a second time, followed by another comprehension check question that differed from the first. Overall, 87.8% of respondents answered either the first or second check correctly, representing a high level of comprehension and attention.

d  |> 
  filter(Group != "Control") |>
  mutate(Group = factor(Group, levels = c("Brain", "Design", "Clubs"))) |> 
  select(Group, TotalCheck) |> 
  mutate(TotalCheck = TotalCheck == "Correct",
         Group = as.integer(Group)) -> dat_list  # ulam expects an integer group index

m.Attention <- ulam(
  alist(
    TotalCheck ~ bernoulli(p),
    logit(p) <- a[Group],
    a[Group] ~ dnorm(0, 1.5)
  ), data = dat_list, cores = 8, chains = 8, iter = 2750, warmup = 1000)  

post <- tidy_draws(m.Attention) |> select(.draw, a.1:a.3)

# Compute probs
post <- post |> mutate(across(a.1:a.3, .fns = plogis))
post <- 
  post |> 
    pivot_longer(cols = a.1:a.3, names_to = "Group", values_to = "p") |> 
    mutate(Group = str_extract(Group, pattern = "(?<=\\.)[:digit:]"),
           # Labels must follow the factor order set above: Brain, Design, Clubs
           Group = factor(Group, labels = c("Brain", "Design", "Clubs")))

post |> 
  group_by(Group) |> 
  median_hdi(p) |> 
  kable(digits = 2)
Group p .lower .upper .width .point .interval
Brain 0.87 0.82 0.91 0.95 median hdi
Design 0.88 0.83 0.92 0.95 median hdi
Clubs 0.88 0.84 0.92 0.95 median hdi

The success rate on this measure was essentially identical across the experimental groups.

References

McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan (2nd ed.). Taylor and Francis, CRC Press.

Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.

Session Info

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.1
## 
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] digest_0.6.29              tidybayes.rethinking_3.0.0
##  [3] tidybayes_3.0.2            posterior_1.2.0           
##  [5] rethinking_2.21            cmdstanr_0.4.0.9001       
##  [7] rstan_2.21.3               StanHeaders_2.21.0-7      
##  [9] janitor_2.1.0              broom_0.7.12              
## [11] knitr_1.37                 ggridges_0.5.3            
## [13] forcats_0.5.1              stringr_1.4.0             
## [15] dplyr_1.0.8                purrr_0.3.4               
## [17] readr_2.1.2                tidyr_1.2.0               
## [19] tibble_3.1.6               ggplot2_3.3.5             
## [21] tidyverse_1.3.1            datapasta_3.1.0           
## [23] here_1.0.1                
## 
## loaded via a namespace (and not attached):
##  [1] matrixStats_0.61.0   fs_1.5.2             lubridate_1.8.0     
##  [4] httr_1.4.2           rprojroot_2.0.2      tensorA_0.36.2      
##  [7] tools_4.1.2          backports_1.4.1      bslib_0.3.1         
## [10] utf8_1.2.2           R6_2.5.1             ggdist_3.1.0        
## [13] DBI_1.1.2            colorspace_2.0-3     withr_2.4.3         
## [16] tidyselect_1.1.2     gridExtra_2.3        prettyunits_1.1.1   
## [19] processx_3.5.2       compiler_4.1.2       cli_3.2.0           
## [22] rvest_1.0.2          HDInterval_0.2.2     arrayhelpers_1.1-0  
## [25] xml2_1.3.3           labeling_0.4.2       sass_0.4.0          
## [28] scales_1.1.1         checkmate_2.0.0      mvtnorm_1.1-3       
## [31] callr_3.7.0          rmarkdown_2.11       pkgconfig_2.0.3     
## [34] htmltools_0.5.2      highr_0.9            dbplyr_2.1.1        
## [37] fastmap_1.1.0        rlang_1.0.1          readxl_1.3.1        
## [40] rstudioapi_0.13      svUnit_1.0.6         shape_1.4.6         
## [43] jquerylib_0.1.4      generics_0.1.2       farver_2.1.0        
## [46] jsonlite_1.7.3       distributional_0.3.0 inline_0.3.19       
## [49] magrittr_2.0.2       loo_2.4.1            Rcpp_1.0.8          
## [52] munsell_0.5.0        fansi_1.0.2          abind_1.4-5         
## [55] lifecycle_1.0.1      stringi_1.7.6        yaml_2.3.5          
## [58] snakecase_0.11.0     MASS_7.3-55          pkgbuild_1.3.1      
## [61] plyr_1.8.6           grid_4.1.2           crayon_1.5.0        
## [64] lattice_0.20-45      haven_2.4.3          hms_1.1.1           
## [67] ps_1.6.0             pillar_1.7.0         codetools_0.2-18    
## [70] stats4_4.1.2         reprex_2.0.1         glue_1.6.1          
## [73] evaluate_0.15        data.table_1.14.3    RcppParallel_5.1.5  
## [76] modelr_0.1.8         vctrs_0.3.8          tzdb_0.2.0          
## [79] cellranger_1.1.0     gtable_0.3.0         assertthat_0.2.1    
## [82] xfun_0.29            coda_0.19-4          ellipsis_0.3.2